Abstract

In this paper a new result on the recovery of sparse vectors from deterministic and noisy measurements by $\ell_1$ minimization is given. The sparse vector is randomly chosen and follows a generic $p$-sparse model introduced by Candes and Plan. The main theorem ensures the consistency of $\ell_1$ minimization with high probability. This first result is then extended to compressible vectors.
Introduction

Let $A$ be a real matrix with $n$ rows and $m$ columns, with $m > n$. Let $x^0$ be a sparse vector following a generic $p$-sparse model and let $y$ be the data vector $y = Ax^0 + b$, where $b$ is a noise vector. The question we address is: can we give a bound on the sparsity of $x^0$ ensuring that $x^0$ can be recovered or estimated by $\ell_1$ minimization with high probability?

Candes and Plan [1] partially answer this question under assumptions on the coherence of the matrix $A$, with random (Gaussian) noise, and with hypotheses on the minimum absolute value of the nonzero components of $x^0$. They proved that, with high probability, the support and the sign of $x^0$ can be recovered by $\ell_1$ minimization if $x^0$ is sparse enough.
In this paper we show that, under the same assumptions on the coherence and the sparsity, with bounded noise and without any assumption on the minimum absolute value of $x^0$, $\ell_1$ minimization provides a vector $x^*$ such that $\|x^* - x^0\|_2$ can be bounded. Moreover, this new result can be extended to compressible vectors, that is, vectors that are close to sparse vectors.

Since an explicit formulation of $x^*$ is not available without the minimum value assumption, different tools must be developed.

Section 1 gives notations and definitions. Section 2 presents the contributions of the paper and connects them to prior work. Section 3 gives the proofs of the main results. Section 4 is devoted to discussion, and Section 5 concludes.



1. Notations and Definitions 

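Let us recall the definition of the subdifferential of the $\ell_1$ norm at a point $x$ whose support is $I$:

$$\partial\|x\|_1 = \left\{\xi \in \mathbb{R}^m \ :\ \|\xi\|_\infty \le 1 \ \text{and} \ \xi(i) = \operatorname{sign}(x(i)) \ \text{for all } i \in I\right\}.$$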
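Membership in this set can be tested directly from the two conditions of the definition. The following is a minimal sketch assuming NumPy; the function name `in_l1_subdifferential` and the tolerance `tol` are illustrative choices, not notation from this paper.

```python
import numpy as np

def in_l1_subdifferential(xi, x, tol=1e-12):
    """Check whether xi lies in the subdifferential of the l1 norm at x.

    Tests the two conditions of the definition: ||xi||_inf <= 1, and
    xi(i) = sign(x(i)) for every i in the support I of x.
    """
    xi = np.asarray(xi, dtype=float)
    x = np.asarray(x, dtype=float)
    support = x != 0                                   # boolean mask of I
    bounded = np.max(np.abs(xi)) <= 1 + tol
    sign_ok = np.allclose(xi[support], np.sign(x[support]), atol=tol)
    return bool(bounded) and bool(sign_ok)

x = np.array([2.0, 0.0, -1.0, 0.0])
print(in_l1_subdifferential([1.0, 0.3, -1.0, -0.9], x))  # True
print(in_l1_subdifferential([1.0, 1.2, -1.0, 0.0], x))   # False: |xi(1)| > 1
print(in_l1_subdifferential([0.5, 0.0, -1.0, 0.0], x))   # False: xi(0) != sign(x(0))
```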

The Bregman distance is defined as follows: 

Definition 1. Let $x^1$ and $x$ be two vectors of $\mathbb{R}^m$. For all $\xi \in \partial\|x^1\|_1$, the Bregman distance between $x$ and $x^1$ is defined by

$$D_\xi(x, x^1) = \|x\|_1 - \|x^1\|_1 - \langle \xi, x - x^1 \rangle.$$
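The Bregman distance of the $\ell_1$ norm can be evaluated directly from Definition 1. A minimal sketch assuming NumPy (the helper name `bregman_l1` is hypothetical); the example uses a vector $\xi$ that belongs to $\partial\|x^1\|_1$, so the result is nonnegative by convexity of the $\ell_1$ norm.

```python
import numpy as np

def bregman_l1(x, x1, xi):
    """D_xi(x, x1) = ||x||_1 - ||x1||_1 - <xi, x - x1>, with xi a subgradient at x1."""
    x, x1, xi = (np.asarray(v, dtype=float) for v in (x, x1, xi))
    return np.abs(x).sum() - np.abs(x1).sum() - xi @ (x - x1)

x1 = np.array([1.0, 0.0, -2.0])
xi = np.array([1.0, 0.5, -1.0])    # ||xi||_inf <= 1 and xi = sign(x1) on the support of x1
x = np.array([0.5, -1.0, -2.0])
print(bregman_l1(x, x1, xi))       # 1.5

# Nonnegativity on random points (convexity of the l1 norm)
rng = np.random.default_rng(0)
print(min(bregman_l1(rng.standard_normal(3), x1, xi) for _ in range(1000)) >= -1e-12)
```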

The generic $p$-sparse model is defined by Candes and Plan [1] as follows:

Definition 2. A vector $x$ follows the generic $p$-sparse model if the support of $x$ is chosen uniformly at random among all supports of cardinality $p$, and if its sign vector on this support is chosen uniformly at random among all possible sign vectors.
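The generic $p$-sparse model is easy to simulate. A minimal sketch assuming NumPy's `Generator` API; the function name is hypothetical, and since the model only specifies the support and the signs, the unit magnitudes used by default are an arbitrary illustrative choice.

```python
import numpy as np

def sample_generic_p_sparse(m, p, rng, magnitudes=None):
    """Draw x in R^m following the generic p-sparse model.

    Support: uniform among subsets of {0, ..., m-1} of size p.
    Signs: uniform among the 2^p sign vectors (i.i.d. fair signs).
    Magnitudes are not constrained by the model; all ones by default.
    """
    support = np.sort(rng.choice(m, size=p, replace=False))
    signs = rng.choice([-1.0, 1.0], size=p)
    if magnitudes is None:
        magnitudes = np.ones(p)
    x = np.zeros(m)
    x[support] = signs * np.abs(magnitudes)
    return x, support

rng = np.random.default_rng(0)
x0, I = sample_generic_p_sparse(m=20, p=4, rng=rng)
print(I, np.sign(x0[I]))
```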

Definition 3. For a matrix $B$, the norms $\|B\|_{p\to q}$ and $\|B\|_p$ are defined as follows:

$$\|B\|_{p\to q} = \sup_{x \neq 0} \frac{\|Bx\|_q}{\|x\|_p} \quad \text{and} \quad \|B\|_p = \|B\|_{p\to p}.$$
For a matrix $B$ such that $B^tB$ is invertible, $B^+ = (B^tB)^{-1}B^t$ denotes the Moore-Penrose pseudoinverse of $B$.

Recall that $\|B^+\|_2 = \sqrt{\|(B^tB)^{-1}\|_2}$ and that $\|B\|_{1\to 2} = \max_i \|b_i\|_2$, where $(b_i)_i$ are the columns of $B$.
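A quick numerical check of these two identities, assuming NumPy (`np.linalg.norm(M, 2)` returns the spectral norm and `np.linalg.pinv` the Moore-Penrose pseudoinverse). The $1\to 2$ norm is reached at a vertex $\pm e_i$ of the $\ell_1$ unit ball, which is why it equals the largest column norm; the sampling lines only confirm that no sampled point of the $\ell_1$ unit sphere exceeds it.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((8, 3))           # tall matrix, so B^t B is invertible
B_pinv = np.linalg.inv(B.T @ B) @ B.T     # B^+ = (B^t B)^{-1} B^t

# ||B^+||_2 = sqrt(||(B^t B)^{-1}||_2), both in spectral norm
lhs = np.linalg.norm(B_pinv, 2)
rhs = np.sqrt(np.linalg.norm(np.linalg.inv(B.T @ B), 2))

# ||B||_{1->2} = max_i ||b_i||_2
norm_1_to_2 = np.linalg.norm(B, axis=0).max()
v = rng.standard_normal((3, 1000))
v /= np.abs(v).sum(axis=0)                # columns of v lie on the l1 unit sphere
sampled = np.linalg.norm(B @ v, axis=0).max()

print(np.isclose(lhs, rhs), np.allclose(B_pinv, np.linalg.pinv(B)), sampled <= norm_1_to_2 + 1e-12)
```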

Definition 4. An $n \times m$ matrix $A$ whose columns are normalized is said to satisfy the $A_0$-coherence criterion if

$$\mu(A) = \max_{i \neq j} |\langle a_j, a_i \rangle| \le \frac{A_0}{\ln m}, \qquad (1)$$

where $A_0$ is a nonnegative real number.
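The coherence and criterion (1) are straightforward to evaluate. A minimal sketch assuming NumPy; the helper names are hypothetical, and the column-normalized Gaussian test matrix only illustrates the computation (it is not claimed to satisfy (1) for any particular $A_0$). The printed value $\mu(A)\ln m$ is the smallest $A_0$ for which (1) holds.

```python
import numpy as np

def coherence(A):
    """mu(A) = max_{i != j} |<a_i, a_j>| for a matrix with unit-norm columns."""
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)              # exclude the pairs i == j
    return G.max()

def satisfies_coherence_criterion(A, A0):
    """Check criterion (1): mu(A) <= A0 / ln m."""
    m = A.shape[1]
    return coherence(A) <= A0 / np.log(m)

rng = np.random.default_rng(2)
n, m = 256, 512
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)            # normalize the columns
mu = coherence(A)
print(mu, mu * np.log(m))
```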

For a given vector $x^0 \in \mathbb{R}^m$ whose support is $I$, $A_I$ denotes the submatrix of $A$ whose columns are the columns of $A$ indexed by $I$. For a given vector $x$, $x_I$ denotes the subvector whose components are the components of $x$ indexed by $I$. The vector $\operatorname{sign}(x)$ is the vector whose component indexed by $i$ is $1$ if $x(i) > 0$, $-1$ if $x(i) < 0$, and $0$ if $x(i) = 0$. If the columns $(a_i)_{i \in I}$ are linearly independent, the matrix $A_I^tA_I$ is invertible, and for any $x^0$ such that $\operatorname{Supp}(x^0) = I$ one can define

$$d(x^0) = A_I(A_I^tA_I)^{-1}\operatorname{sign}(x^0_I) \quad \text{and} \quad IC(x^0) = \max_{i \notin I} |\langle a_i, d(x^0) \rangle|.$$



This Identification Coefficient (IC) can be seen as a signed version of the ERC (Exact Recovery Coefficient) introduced by Tropp [2, 3]. The condition $IC(x^0) < 1$ (see Fuchs [4]) is a sufficient condition for exact recovery by $\ell_1$ minimization.
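A minimal sketch computing $d(x^0)$ and $IC(x^0)$, assuming NumPy and a column-normalized Gaussian matrix as a stand-in for $A$ (the sizes are hypothetical). Solving the linear system instead of forming $(A_I^tA_I)^{-1}$ is only a numerical convenience. When $IC(x^0) < 1$, the vector $s = A^td(x^0)$ belongs to $\partial\|x^0\|_1$, which the last test checks.

```python
import numpy as np

def d_and_ic(A, x0):
    """Return d(x0) = A_I (A_I^t A_I)^{-1} sign(x0_I) and IC(x0) = max_{i not in I} |<a_i, d(x0)>|."""
    I = np.flatnonzero(x0)
    Ic = np.setdiff1d(np.arange(A.shape[1]), I)
    A_I = A[:, I]
    d = A_I @ np.linalg.solve(A_I.T @ A_I, np.sign(x0[I]))
    ic = np.abs(A[:, Ic].T @ d).max()
    return d, ic

rng = np.random.default_rng(3)
n, m, p = 128, 256, 3
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(m)
x0[rng.choice(m, size=p, replace=False)] = rng.choice([-1.0, 1.0], size=p)

d, ic = d_and_ic(A, x0)
s = A.T @ d                                   # s = A^t d(x0)
I = np.flatnonzero(x0)
print(ic, np.allclose(s[I], np.sign(x0[I])))  # s agrees with sign(x0) on I
```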



2. Contributions and relations with prior works 



Suppose $y = Ax^0 + b$ with $\|b\|_2 \le \varepsilon$ and define the minimization problem

$$\min_{x \in \mathbb{R}^m} \|x\|_1 \quad \text{under the constraint} \quad \|Ax - y\|_2 \le \varepsilon. \qquad (2)$$

Let $x^*$ be a minimizer of (2). The following theorem holds.

Theorem 1. Let $A$ be an $n \times m$ matrix satisfying the $A_0$-coherence criterion with $A_0$ small enough, and suppose that $x^0$ follows the generic $p$-sparse model with $p \le \frac{c_0\, m}{\|A\|_2^2 \ln m}$ for $c_0$ small enough depending on $A_0$. Suppose $\|b\|_2 \le \varepsilon$ and $y = Ax^0 + b$. Then any solution $x^*$ of (2) satisfies

$$\|x^0 - x^*\|_2 \le C\varepsilon \qquad (3)$$

with

$$C = 2\sqrt{2} + \frac{8(2+\sqrt{2})\sqrt{p}}{3}, \qquad (4)$$

with probability greater than $1 - 4m^{-2\ln 2}$ if $m$ is large enough.
It turns out that $\|x^* - x^0\|_1$ can also be bounded using a similar proof:

$$\|x^0 - x^*\|_1 \le \frac{\varepsilon}{3}\left(14\sqrt{2p} + 16p\right), \qquad (5)$$

with the same probability. The proof of this inequality requires a simple modification of Proposition 1. This extension may be of interest since $x^0$ and $x^*$ belong to a space whose dimension $m$ is much larger than $p$. Applying the theorem to $x^0 = x_s$ and $b_1 = Ar + b$, one obtains the following corollary:

Corollary 1. Suppose $x^1 = x_s + r$, where $\|r\|_2 \le c_1\varepsilon_1$ and $x_s$ follows the generic $p$-sparse model with $p \le \frac{c_0\, m}{\|A\|_2^2 \ln m}$ for $c_0$ small enough, and suppose that $A$ satisfies the $A_0$-coherence criterion with $A_0$ small enough. Suppose $\|b\|_2 \le \varepsilon_1$. Then any solution $x^*$ of (2) with $y = Ax^1 + b$ and $\varepsilon = \varepsilon_1(1 + c_1\|A\|_2)$ satisfies

$$\|x_s - x^*\|_2 \le C\varepsilon \qquad (6)$$

with

$$C = 2\sqrt{2} + \frac{8(2+\sqrt{2})\sqrt{p}}{3}, \qquad (7)$$

with probability greater than $1 - 4m^{-2\ln 2}$ if $m$ is large enough.

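To prove the corollary, one can apply the theorem with $x^0 = x_s$ and $b_1 = Ar + b$, since $y = Ax_s + b_1$ and $\|b_1\|_2 \le \|A\|_2\|r\|_2 + \|b\|_2 \le \varepsilon_1(1 + c_1\|A\|_2) = \varepsilon$.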
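The reduction only uses the triangle inequality on the effective noise $b_1 = Ar + b$. A small numerical sketch, assuming NumPy, a Gaussian matrix chosen purely for illustration, and hypothetical values for $n$, $m$, $\varepsilon_1$ and $c_1$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 100
A = rng.standard_normal((n, m)) / np.sqrt(n)
eps1, c1 = 0.1, 2.0                             # hypothetical eps_1 and c_1

# residual r with ||r||_2 = c1 * eps1 and noise b with ||b||_2 = eps1
r = rng.standard_normal(m)
r *= c1 * eps1 / np.linalg.norm(r)
b = rng.standard_normal(n)
b *= eps1 / np.linalg.norm(b)

b1 = A @ r + b                                  # effective noise seen by Theorem 1
eps = eps1 * (1 + c1 * np.linalg.norm(A, 2))    # budget used in Corollary 1
print(np.linalg.norm(b1), eps, np.linalg.norm(b1) <= eps)
```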

This result sheds new light on the success of $\ell_1$ minimization for the recovery of sparse and compressible vectors from noisy deterministic measurements. No Restricted Isometry Property (RIP) [5, 6] can be used here. The geometry of polytopes associated with $A$ (see Donoho [7]) seems hard to use, and the classical bounds derived from the coherence or the ERC are too weak. In [1] the authors propose an approach with a random model on the vector $x^0$. This work relies on concentration lemmas for the singular values of submatrices due to Tropp and on an explicit formulation of the solution of $\ell_1$ minimization (see Fuchs [8]). This approach ensures exact recovery of the support and the sign of the solution, but it needs conditions on the signal-to-noise ratio and on the decorrelation between the noise and the matrix, and consequently it cannot be easily extended to compressible vectors.
The present article focuses on the $\ell_2$ reconstruction error. In this new setting, neither a signal-to-noise ratio condition nor independence between the matrix and the noise is needed, and the result can easily be extended to compressible vectors. However, this new approach does not give any information on the support of the solution. Unlike [1], the bound holds for any vector $x^1$ satisfying $\|y - Ax^1\|_2 \le \varepsilon$ and $\|x^1\|_1 \le \|x^0\|_1$.

Following Grasmair et al. [9], our approach uses the Bregman distance to bound the part of the $\ell_1$ norm of $x^*$ that is not supported on the support $I$ of $x^0$.



3. Proof of Theorem 1

3.1. Proof of Theorem 1

The proof relies on two propositions. The first one bounds the $\ell_2$ error $\|x^0 - x^*\|_2$ under the hypothesis that $IC(x^0) < 1$.

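Proposition 1. Let $x^0 \in \mathbb{R}^m$ with support $I$. If $IC(x^0) < 1$, then for any solution $x^*$ of (2), the following inequality holds:

$$\|x^* - x^0\|_2 \le 2\varepsilon\sqrt{\|(A_I^tA_I)^{-1}\|_2} + \left(\sqrt{\|(A_I^tA_I)^{-1}\|_2}\,\|A_{I^c}\|_{1\to 2} + 1\right)\frac{2\|d(x^0)\|_2\,\varepsilon}{1 - IC(x^0)}. \qquad (8)$$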

The second proposition ensures that if $x^0$ follows the generic $p$-sparse model with $p$ small enough, then with high probability $IC(x^0) < \frac{1}{4}$ and $\|(A_I^tA_I)^{-1}\|_2 \le 2$.

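Proposition 2. Suppose $x^0$ follows the generic $p$-sparse model with $p \le \frac{c_0\, m}{\|A\|_2^2 \ln m}$. Let $\kappa_0 > 0$ and $\kappa = \kappa_0/\sqrt{\ln m}$. If $A_0$ and $c_0$ are small enough (depending on $\kappa_0$), then for all $t > 0$,

$$\mathbb{P}\Big(\big(\|(A_I^tA_I)^{-1}\|_2 \le 2\big) \cap \big(IC(x^0) < t\big)\Big) \ge 1 - 2m\exp\left(-\frac{t^2}{2\kappa^2}\right) - 2m^{-2\ln 2}. \qquad (9)$$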

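Taking $t = \frac{1}{4}$ and choosing $c_0$ small enough in Proposition 2 yields

$$\mathbb{P}\Big(\big(IC(x^0) < \tfrac{1}{4}\big) \cap \big(\|(A_I^tA_I)^{-1}\|_2 \le 2\big)\Big) \ge 1 - 4m^{-2\ln 2}. \qquad (10)$$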

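Moreover, on this event,

$$\|d(x^0)\|_2^2 = \left\langle \operatorname{sign}(x^0_I), (A_I^tA_I)^{-1}\operatorname{sign}(x^0_I) \right\rangle \le \|(A_I^tA_I)^{-1}\|_2\, p \le 2p. \qquad (11)$$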

It can be noticed that for any support $I$, if the columns of $A$ are normalized, then $\|A_{I^c}\|_{1\to 2} = 1$.

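Applying Proposition 1 to $x^0$, with $\sqrt{\|(A_I^tA_I)^{-1}\|_2} \le \sqrt{2}$, $\|d(x^0)\|_2 \le \sqrt{2p}$, $\|A_{I^c}\|_{1\to 2} = 1$ and $\frac{1}{1 - IC(x^0)} < \frac{4}{3}$, it follows that with probability greater than $1 - 4m^{-2\ln 2}$,

$$\|x^0 - x^*\|_2 \le \varepsilon\left(2\sqrt{2} + \frac{8(2+\sqrt{2})\sqrt{p}}{3}\right), \qquad (12)$$

which concludes the proof of Theorem 1.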
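The constant (4) can be recomputed from the ingredients of Proposition 1. The sketch below (plain Python; the function name and keyword arguments are hypothetical) plugs in $\|(A_I^tA_I)^{-1}\|_2 \le 2$, $IC(x^0) < \frac14$, $\|d(x^0)\|_2 \le \sqrt{2p}$ and $\|A_{I^c}\|_{1\to 2} = 1$, and compares the result with the closed form; changing `ic_bound` or `gram_inv_bound` shows how the arbitrary constants discussed in Section 4 affect $C$.

```python
import math

def theorem1_constant(p, ic_bound=0.25, gram_inv_bound=2.0):
    """C obtained from (8) when IC(x0) < ic_bound and ||(A_I^t A_I)^{-1}||_2 <= gram_inv_bound."""
    pinv_norm = math.sqrt(gram_inv_bound)   # ||A_I^+||_2 = sqrt(||(A_I^t A_I)^{-1}||_2)
    d_norm = math.sqrt(gram_inv_bound * p)  # ||d(x0)||_2 bound from (11)
    col_norm = 1.0                          # ||A_{I^c}||_{1->2} for unit-norm columns
    return 2 * pinv_norm + (pinv_norm * col_norm + 1) * 2 * d_norm / (1 - ic_bound)

for p in (1, 4, 16):
    closed_form = 2 * math.sqrt(2) + 8 * (2 + math.sqrt(2)) * math.sqrt(p) / 3
    print(p, theorem1_constant(p), closed_form)
```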



3.2. Proof of Proposition 1

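The proof of Proposition 1 follows the one of Grasmair et al. [9], using the fact that, under the assumption $IC(x^0) < 1$, $s = A^td(x^0) \in \partial\|x^0\|_1$. Since $A_I^+A_I = \mathrm{Id}$ and $A_I(x^*_I - x^0_I) = (Ax^* - y) + (y - Ax^0) - A_{I^c}x^*_{I^c}$, with $\|Ax^* - y\|_2 \le \varepsilon$ and $\|y - Ax^0\|_2 = \|b\|_2 \le \varepsilon$, one gets

$$\begin{aligned}
\|x^* - x^0\|_2 &\le \|x^*_I - x^0_I\|_2 + \|x^*_{I^c}\|_2 \\
&\le \|A_I^+A_I(x^*_I - x^0_I)\|_2 + \|x^*_{I^c}\|_1 \\
&\le \|A_I^+\|_2\,\|A_I(x^*_I - x^0_I)\|_2 + \|x^*_{I^c}\|_1 \\
&\le \|A_I^+\|_2\left(2\varepsilon + \|A_{I^c}x^*_{I^c}\|_2\right) + \|x^*_{I^c}\|_1 \\
&\le \|A_I^+\|_2\left(2\varepsilon + \|A_{I^c}\|_{1\to 2}\|x^*_{I^c}\|_1\right) + \|x^*_{I^c}\|_1 \\
&\le 2\varepsilon\|A_I^+\|_2 + \left(\|A_I^+\|_2\|A_{I^c}\|_{1\to 2} + 1\right)\|x^*_{I^c}\|_1.
\end{aligned}$$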
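Replacing $2\varepsilon$ by $\|A(x - x^0)\|_2$ (which is at most $2\varepsilon$ when $x$ satisfies the constraint of (2)), the chain of inequalities above holds for every $x \in \mathbb{R}^m$. A numerical sanity check, assuming NumPy and a column-normalized Gaussian stand-in for $A$ with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, p = 40, 80, 5
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)                          # unit-norm columns
I = np.sort(rng.choice(m, size=p, replace=False))
Ic = np.setdiff1d(np.arange(m), I)
x0 = np.zeros(m)
x0[I] = rng.standard_normal(p)

pinv_norm = np.linalg.norm(np.linalg.pinv(A[:, I]), 2)  # ||A_I^+||_2
col_norm = np.linalg.norm(A[:, Ic], axis=0).max()       # ||A_{I^c}||_{1->2}

slack = []
for _ in range(200):
    x = x0 + 0.1 * rng.standard_normal(m)               # arbitrary perturbation of x0
    lhs = np.linalg.norm(x - x0)
    rhs = (pinv_norm * np.linalg.norm(A @ (x - x0))
           + (pinv_norm * col_norm + 1) * np.abs(x[Ic]).sum())
    slack.append(rhs - lhs)
print(min(slack) >= 0)
```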

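Using the Bregman distance, $\|x^*_{I^c}\|_1$ can be bounded. Indeed, from the definition of $s = A^td(x^0)$, which satisfies $s_i = \operatorname{sign}(x^0_i)$ for $i \in I$, it follows that

$$\begin{aligned}
D_s(x^*, x^0) &= \|x^*\|_1 - \|x^0\|_1 - \langle s, x^* - x^0 \rangle \\
&= \|x^*\|_1 - \langle s, x^* \rangle \\
&= \sum_{i \in I}\left(|x^*_i| - \operatorname{sign}(x^0_i)\,x^*_i\right) + \sum_{j \notin I}\left(|x^*_j| - s_j x^*_j\right) \\
&\ge \sum_{j \notin I}\left(\operatorname{sign}(x^*_j) - s_j\right)x^*_j.
\end{aligned}$$

Since $|s_j| \le IC(x^0)$ for all $j \notin I$, one gets

$$D_s(x^*, x^0) \ge \sum_{j \notin I}\left(1 - IC(x^0)\right)\operatorname{sign}(x^*_j)\,x^*_j = \left(1 - IC(x^0)\right)\|x^*_{I^c}\|_1,$$

that is,

$$\|x^*_{I^c}\|_1 \le \frac{D_s(x^*, x^0)}{1 - IC(x^0)}.$$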

Consequently,

$$\|x^* - x^0\|_2 \le 2\varepsilon\|A_I^+\|_2 + \left(\|A_I^+\|_2\|A_{I^c}\|_{1\to 2} + 1\right)\frac{D_s(x^*, x^0)}{1 - IC(x^0)}. \qquad (13)$$
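The lower bound $D_s(x, x^0) \ge (1 - IC(x^0))\|x_{I^c}\|_1$ used above only relies on $s_I = \operatorname{sign}(x^0_I)$ and on $|s_j| \le IC(x^0)$ off $I$, so it holds for every $x$, not only for $x^*$. A numerical check, assuming NumPy and a column-normalized Gaussian stand-in for $A$ (hypothetical sizes chosen so that $IC(x^0) < 1$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, p = 128, 256, 3
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)
I = np.sort(rng.choice(m, size=p, replace=False))
Ic = np.setdiff1d(np.arange(m), I)
x0 = np.zeros(m)
x0[I] = rng.choice([-1.0, 1.0], size=p) * rng.uniform(0.5, 2.0, size=p)

A_I = A[:, I]
d = A_I @ np.linalg.solve(A_I.T @ A_I, np.sign(x0[I]))  # d(x0)
s = A.T @ d                                              # s = A^t d(x0)
ic = np.abs(s[Ic]).max()                                 # IC(x0)

def bregman(x):
    """D_s(x, x0) for the l1 norm."""
    return np.abs(x).sum() - np.abs(x0).sum() - s @ (x - x0)

slack = []
for _ in range(200):
    x = x0 + rng.uniform(0.01, 1.0) * rng.standard_normal(m)
    slack.append(bregman(x) - (1 - ic) * np.abs(x[Ic]).sum())
print(ic, min(slack) >= -1e-10)
```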



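The Bregman distance can be bounded as follows. Since $IC(x^0) < 1$, $s = A^td(x^0) \in \partial\|x^0\|_1$. Moreover, $x^0$ satisfies the constraint of (2), so $\|x^*\|_1 \le \|x^0\|_1$. Consequently,

$$\begin{aligned}
D_s(x^*, x^0) &\le -\langle A^td(x^0), x^* - x^0 \rangle \\
&= -\langle d(x^0), A(x^* - x^0) \rangle \\
&\le \|d(x^0)\|_2\,\|A(x^* - x^0)\|_2 \\
&\le \|d(x^0)\|_2\left(\|Ax^* - y\|_2 + \|y - Ax^0\|_2\right) \\
&\le 2\|d(x^0)\|_2\,\varepsilon.
\end{aligned}$$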



Combining this bound with (13) and the fact that $\|A_I^+\|_2 = \sqrt{\|(A_I^tA_I)^{-1}\|_2}$ concludes the proof of Proposition 1. □
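The upper bound only uses $\|x\|_1 \le \|x^0\|_1$ and the Cauchy-Schwarz inequality, so for any $x$ with $\|x\|_1 \le \|x^0\|_1$ one has $D_s(x, x^0) \le \|d(x^0)\|_2\,\|A(x - x^0)\|_2$; the factor $2\varepsilon$ then comes from feasibility. A numerical check, assuming NumPy, a column-normalized Gaussian stand-in for $A$, and random points rescaled into the $\ell_1$ ball of radius $\|x^0\|_1$:

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, p = 128, 256, 3
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)
I = np.sort(rng.choice(m, size=p, replace=False))
x0 = np.zeros(m)
x0[I] = rng.choice([-1.0, 1.0], size=p)
A_I = A[:, I]
d = A_I @ np.linalg.solve(A_I.T @ A_I, np.sign(x0[I]))  # d(x0)
s = A.T @ d                                              # s = A^t d(x0)

def bregman(x):
    """D_s(x, x0) for the l1 norm."""
    return np.abs(x).sum() - np.abs(x0).sum() - s @ (x - x0)

slack = []
for _ in range(200):
    x = x0 + 0.3 * rng.standard_normal(m)
    x *= min(1.0, np.abs(x0).sum() / np.abs(x).sum())   # enforce ||x||_1 <= ||x0||_1
    slack.append(np.linalg.norm(d) * np.linalg.norm(A @ (x - x0)) - bregman(x))
print(min(slack) >= -1e-10)
```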



3.3. Proof of Proposition 2

The proof relies on a proposition due to Tropp [10] (see also [1]).

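Proposition 3. Suppose that the set $I$ is randomly and uniformly chosen among sets of cardinality $p$ with $p \le \frac{m}{4\|A\|_2^2}$. Then for $q = 2\ln m$,

$$\left(\mathbb{E}\|A_I^tA_I - \mathrm{Id}\|_2^q\right)^{1/q} \le 30\mu(A)\ln m + 13\sqrt{\frac{2p\|A\|_2^2\ln m}{m}} \qquad (14)$$

and

$$\left(\mathbb{E}\max_{j \notin I}\|A_I^ta_j\|_2^q\right)^{1/q} \le 4\mu(A)\sqrt{\ln m} + \sqrt{\frac{2p\|A\|_2^2}{m}}. \qquad (15)$$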

From this proposition and Markov's inequality, Candes and Plan [1] proved the following corollary:

Corollary 2. Suppose that $A$ satisfies the $A_0$-coherence criterion and that $x^0$ follows the generic $p$-sparse model with $p \le \frac{c_0\, m}{\|A\|_2^2 \ln m}$ and $30A_0 + 13\sqrt{2c_0} \le \frac{1}{4}$. Then, with probability greater than $1 - m^{-2\ln 2}$, $A_I^tA_I$ is invertible and

$$\|(A_I^tA_I)^{-1}\|_2 \le 2. \qquad (16)$$
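Corollary 2 can be illustrated (not proved) by a Monte Carlo experiment: sample uniformly random supports of size $p$ and record how often $\lambda_{\min}(A_I^tA_I) \ge \frac12$, which is equivalent to $\|(A_I^tA_I)^{-1}\|_2 \le 2$. This sketch assumes NumPy and uses a column-normalized Gaussian matrix with hypothetical sizes $n = 128$, $m = 256$, $p = 4$ as a stand-in; it does not check the $A_0$ and $c_0$ hypotheses of the corollary.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, p, trials = 128, 256, 4, 500
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

hits = 0
for _ in range(trials):
    I = rng.choice(m, size=p, replace=False)  # uniformly random support of size p
    G = A[:, I].T @ A[:, I]                   # A_I^t A_I
    lam_min = np.linalg.eigvalsh(G)[0]        # eigvalsh returns eigenvalues in ascending order
    hits += int(lam_min >= 0.5)               # same event as ||(A_I^t A_I)^{-1}||_2 <= 2
print(hits / trials)
```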

From Proposition 3 and Hoeffding's inequality, the following lemma can be deduced (see Candes and Plan [1]):

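Lemma 1. Suppose $x$ follows the generic $p$-sparse model with support $I$ and let $(W_j)_{j \in J}$ be a collection of deterministic vectors. For $Z_0 = \max_{j \in J}|\langle W_j, \operatorname{sign}(x_I) \rangle|$ one has

$$\mathbb{P}(Z_0 \ge t) \le 2|J|\,e^{-\frac{t^2}{2\kappa^2}}$$

for $\kappa \ge \max_{j \in J}\|W_j\|_2$.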
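A Monte Carlo sanity check of Lemma 1, assuming NumPy: the vectors $W_j$ are drawn once and then held fixed, the sign vectors are i.i.d. uniform in $\{-1, +1\}^p$, and the empirical frequency of $\{Z_0 \ge t\}$ is compared with $2|J|e^{-t^2/(2\kappa^2)}$. The sizes and scale are arbitrary choices; the union bound is loose, so it exceeds 1 for small $t$.

```python
import numpy as np

rng = np.random.default_rng(8)
p, n_vectors, trials = 10, 50, 20000
W = 0.1 * rng.standard_normal((n_vectors, p))      # fixed vectors W_j, one per row
kappa = np.linalg.norm(W, axis=1).max()            # smallest admissible kappa

signs = rng.choice([-1.0, 1.0], size=(trials, p))  # uniform random sign vectors
Z0 = np.abs(signs @ W.T).max(axis=1)               # Z_0 = max_j |<W_j, sign>| for each draw

results = []
for t in (0.5, 1.0, 1.5, 2.0):
    empirical = float((Z0 >= t).mean())
    bound = 2 * n_vectors * np.exp(-t**2 / (2 * kappa**2))
    results.append((t, empirical, bound))
    print(f"t={t}: empirical {empirical:.4f}, bound {bound:.4f}")
```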

Applying Lemma 1 and Proposition 3, the proof of Proposition 2 can be completed as follows.

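For all $j \notin I$, define $W_j = (A_I^tA_I)^{-1}A_I^ta_j$, where $I$ is the support of $x^0$, so that $\langle W_j, \operatorname{sign}(x^0_I) \rangle = \langle a_j, d(x^0) \rangle$ and $IC(x^0) = \max_{j \notin I}|\langle W_j, \operatorname{sign}(x^0_I) \rangle|$. Applying Lemma 1 conditionally on $I$, one gets, for $\kappa \ge \max_{j \notin I}\|W_j\|_2$,

$$\mathbb{P}(IC(x^0) \ge t) \le 2|I^c|\,e^{-\frac{t^2}{2\kappa^2}} \le 2m\,e^{-\frac{t^2}{2\kappa^2}}. \qquad (17)$$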

We need to estimate $\max_{j \notin I}\|W_j\|_2$. Using Corollary 2 one gets

$$\|W_j\|_2 = \|(A_I^tA_I)^{-1}A_I^ta_j\|_2 \le \|(A_I^tA_I)^{-1}\|_2\,\|A_I^ta_j\|_2 \le 2\|A_I^ta_j\|_2 \qquad (18)$$

with probability greater than $1 - m^{-2\ln 2}$. Proposition 3 and Markov's inequality are then used to estimate $\max_{j \notin I}\|A_I^ta_j\|_2$.
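A numerical check of (18) and of the identity $IC(x^0) = \max_{j \notin I}|\langle W_j, \operatorname{sign}(x^0_I) \rangle|$ used in (17), assuming NumPy and a column-normalized Gaussian stand-in for $A$ with hypothetical sizes. Column $j$ of the array `W` holds $W_j$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m, p = 128, 256, 4
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)
I = np.sort(rng.choice(m, size=p, replace=False))
Ic = np.setdiff1d(np.arange(m), I)
A_I = A[:, I]
G_inv = np.linalg.inv(A_I.T @ A_I)             # (A_I^t A_I)^{-1}

W = G_inv @ A_I.T @ A[:, Ic]                   # W_j = (A_I^t A_I)^{-1} A_I^t a_j, as columns
lhs = np.linalg.norm(W, axis=0)                # ||W_j||_2
rhs = np.linalg.norm(G_inv, 2) * np.linalg.norm(A_I.T @ A[:, Ic], axis=0)
print(np.all(lhs <= rhs + 1e-12), np.linalg.norm(G_inv, 2))

# IC(x0) computed through W_j agrees with its definition through d(x0)
sign_I = rng.choice([-1.0, 1.0], size=p)
d = A_I @ G_inv @ sign_I
ic_matches = np.isclose(np.abs(W.T @ sign_I).max(), np.abs(A[:, Ic].T @ d).max())
print(ic_matches)
```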

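Fix $\kappa_0 > 0$ and set $\kappa = \kappa_0/\sqrt{\ln m}$. By Markov's inequality, (15), $\mu(A) \le \frac{A_0}{\ln m}$ and $p \le \frac{c_0\, m}{\|A\|_2^2 \ln m}$,

$$\mathbb{P}\left(\max_{j \notin I}\|A_I^ta_j\|_2 > \frac{\kappa}{2}\right) \le \frac{\mathbb{E}\left(\max_{j \notin I}\|A_I^ta_j\|_2^q\right)}{(\kappa/2)^q} \le \left(\frac{2\left(4A_0 + \sqrt{2c_0}\right)}{\kappa_0}\right)^q.$$

If $A_0$ and $c_0$ are small enough, $4A_0 + \sqrt{2c_0} \le \frac{\kappa_0}{4}$, and using $q = 2\ln m$ it follows that

$$\mathbb{P}\left(\max_{j \notin I}\|A_I^ta_j\|_2 > \frac{\kappa}{2}\right) \le m^{-2\ln 2}. \qquad (19)$$

From (18) and (19) it follows that

$$\mathbb{P}\left(\max_{j \notin I}\|W_j\|_2 > \kappa\right) \le 2m^{-2\ln 2}. \qquad (20)$$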

Combined with inequality (17), this last inequality concludes the proof of the proposition. □



4. Discussion

The two constants $\frac{1}{4}$ and $2$ in the proof of Theorem 1 are chosen arbitrarily. If other bounds are chosen, the admissible value of $c_0$ changes and the value of $C$ in Theorem 1 may change. It turns out that these values are numerically pessimistic and that optimizing them would not be useful. Two relevant questions may be asked about Theorem 1:

Can we expect better bounds on the sparsity using the criterion IC < 1? 

Constants may be optimized, but it seems that the asymptotic behavior of the sparsity bound may not be improved. In [11] and [12] the authors proved that for Gaussian measurements, beyond sparsity $\frac{n}{2\ln m}$, $IC(x^0) > 1$ with high probability. It would be surprising if better results could be achieved with deterministic measurements.

However, the Grasmair approach (see Proposition 1) also applies to other vectors $\eta = A^tw$ in the subdifferential of the $\ell_1$ norm at $x^0$ satisfying $\max_{j \notin I}|\eta_j| < 1$, not only to $s = A^td(x^0)$. It may be possible to improve the sparsity bound using another vector $\eta$.

The second question concerns the $\sqrt{p}$ scaling in the bound (4). Is this scaling optimal? Can we expect a better bound?

RIP theory gives similar bounds where the constant $C$ in (3) does not depend on the sparsity $p$ but on RIP constants that can be uniformly bounded if the vector is sparse enough. Moreover, Fuchs [8] proved that when the noise is small enough and $IC(x^0) < 1$, the support of $x^*$ is equal to the support of $x^0$, that is, $\|x^*_{I^c}\|_1 = 0$. Looking at the proof of Proposition 1, it appears that if $x^*_{I^c} = 0$, the constant $C$ in Theorem 1 can be set to $2\sqrt{2}$, which does not depend on $p$. Unfortunately, if no assumption is made on $\varepsilon$, there is no guarantee that $x^*_{I^c} = 0$. RIP theory solves this problem by ensuring that all submatrices with a small number of columns behave well. Such a hypothesis cannot be made here, and for some noise vectors $b$ it may happen that $\|x^*_{I^c}\|_1 \neq 0$. If $I^*$ denotes the support of $x^*$, the solution $x^*$ satisfies the following implicit equation (see [8]):

$$x^*_{I^*} = (A_{I^*}^tA_{I^*})^{-1}A_{I^*}^ty - \lambda(A_{I^*}^tA_{I^*})^{-1}\operatorname{sign}(x^*_{I^*}), \qquad (21)$$

where $\lambda$ depends on $\varepsilon$. This expression shows that the stability of the solution depends strongly on the matrix $(A_{I^*}^tA_{I^*})^{-1}$, which depends on $x^0$ and on the noise $b$. In practice, in many situations $I^* \not\subset I$, and there is no simple way to control $(A_{I^*}^tA_{I^*})^{-1}$.

The $\sqrt{p}$ scaling may be the price to pay for the lack of control on this matrix.

5. Conclusion 

These results complement the previous ones of Candes and Plan [1] and ensure that, under the same sparsity hypothesis, $\ell_1$ minimization is robust to noise and compressibility even if the exact support and sign cannot be recovered. To control the part of the solution that is not supported on the support $I$ of the target vector $x^0$, no RIP can be used here, but the Bregman distance provides an interesting bound.

References 

[1] E. J. Candes, Y. Plan, Near-ideal model selection by $\ell_1$ minimization, Annals of Statistics 37 (5A) (2009) 2145-2177.

[2] J. A. Tropp, Just relax: convex programming methods for identifying 
sparse signals in noise, IEEE Trans. Info. Theory 52 (3) (2006) 1030- 
1051. 

[3] J. A. Tropp, Greed is good: algorithmic results for sparse approxima- 
tion, IEEE Trans. Info. Theory 50 (10) (2004) 2231-2242. 

[4] J. -J. Fuchs, On sparse representations in arbitrary redundant bases, 
IEEE Trans. Info. Theory 50 (6) (2004) 1341-1344. 

[5] E. J. Candes, The restricted isometry property and its implications for compressed sensing, Comptes Rendus de l'Academie des Sciences, Paris, Serie I 346 (2008) 589-592.

[6] E. Candes, T. Tao, Near-optimal signal recovery from random projec- 
tions: Universal encoding strategies?, IEEE Trans. Info. Theory 52 (12) 
(2006) 5406-5425. 

[7] D. L. Donoho, High-dimensional centrally symmetric polytopes with 
neighborliness proportional to dimension, Discrete & Computational 
Geometry 35 (4) (2006) 617-652. 

[8] J. Fuchs, Recovery of exact sparse representations in the presence of 
bounded noise, IEEE Trans. Info. Theory 51 (10) (2005) 3601-3608. 

[9] M. Grasmair, O. Scherzer, M. Haltmeier, Necessary and sufficient conditions for linear convergence of $\ell_1$ regularization, Communications on Pure and Applied Mathematics 64 (2011) 161-182.

[10] J. A. Tropp, Norms of random submatrices and sparse approximation, 
C. R. Math. Acad. Sci. 346 (2008) 1271-1274. 



[11] C. Dossal, M. Chabanol, G. Peyre, J. Fadili, Sharp support recovery from noisy random measurements by $\ell_1$ minimization, Applied and Computational Harmonic Analysis 33 (1) (2012) 24-43. doi:10.1016/j.acha.2011.09.003. URL http://hal.archives-ouvertes.fr/hal-00553670/



[12] M. J. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (lasso), IEEE Trans. Info. Theory 55 (5) (2009) 2183-2202.



